{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "j9C88y8XBOP1",
   "metadata": {
    "id": "j9C88y8XBOP1"
   },
   "source": [
    "# Using NVIDIA NIM with Elasticsearch vector store"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cCCWL6TTBX0H",
   "metadata": {
    "id": "cCCWL6TTBX0H"
   },
   "source": [
    "Integrating Elasticsearch and NVIDIA NIM using advanced LLM models, significantly boosts applications with natural language processing capabilities. Built on NVIDIA's software stack, which includes CUDA and TensorRT, NVIDIA NIM offers features such as in-flight batching. This technique speeds up request processing by handling multiple iterations of execution simultaneously, while integrating seamlessly with Elasticsearch to enhance data indexing and search functionalities. Before we begin our main example of building a RAG (Retrieval Augmented Generation), use the section below to test the NVIDIA key required for the rest of the application.\n",
    "\n",
    "[**Author: Alex Salgado**\n",
    "](https://www.elastic.co/search-labs/author/alex-salgado)\n",
    "\n",
    "> **Note:** This is the code format of the original Elastic blog titled \"Using NVIDIA NIM with Elasticsearch vector store.\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "_uV6ZhR5Du4N",
   "metadata": {
    "id": "_uV6ZhR5Du4N"
   },
   "source": [
    "\n",
    "## Introduction\n",
    "\n",
    "This blog explores how to integrating Elasticsearch and [NVIDIA NIM](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/), using advanced LLM models, significantly enhances applications by providing superior natural language processing capabilities. Built on NVIDIA's software stack, which includes CUDA and TensorRT, NVIDIA NIM offers features such as in-flight batching. This technique not only speeds up request processing by handling multiple iterations of execution simultaneously but also integrates seamlessly with Elasticsearch to enhance data indexing and search functionalities.\n",
    "Before we begin our main example of building a RAG (Retrieval Augmented Generation), use the section below to test the NVIDIA key required for the rest of the application.\n",
    "\n",
    "### NVIDIA NIM - A general explanation before you start\n",
    "To expedite the development of your application prototypes, consider using the NVIDIA NIM endpoints hosted in the cloud, specially designed to simplify and accelerate this process. At [nvidia.ai](https://www.nvidia.com/en-us/ai), you can explore and experiment with various LLM models through API calls to the NVIDIA NIM microservices.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6htwrIYrzexq",
   "metadata": {
    "id": "6htwrIYrzexq"
   },
   "source": [
    "\n",
    "\n",
    "#### NVIDIA API KEY\n",
    "\n",
    "To create a code to interact with NIM microservices, as in the example below, we utilize a API-KEY retrieved from the environment variable `os.environ['NVIDIA_API_KEY']`.  To acquire this key, you must first establish a developer account on NVIDIA's platform. Once you have logged in, proceed to the designated webpage to obtain your API key. This process is visually detailed in the illustration below for your convenience. For details [click here](https://build.nvidia.com/google/gemma-7b).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ZUM4t9GFD59V",
   "metadata": {
    "id": "ZUM4t9GFD59V"
   },
   "source": [
    "\n",
    "\n",
    "In a general way, using the assistant, you can copy and adjust the code and run it in your environment as follows:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "HfYFU-vWEFF1",
   "metadata": {
    "id": "HfYFU-vWEFF1"
   },
   "outputs": [],
   "source": [
    "!pip install openai\n",
    "\n",
    "from openai import OpenAI\n",
    "\n",
    "client = OpenAI(\n",
    "    base_url=\"https://integrate.api.nvidia.com/v1\", api_key=os.environ[\"NVIDIA_API_KEY\"]\n",
    ")\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "    model=\"google/codegemma-7b\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": \"Elasticsearch command to view the fields of the index 01-rio-meetup:\\n\\n\",\n",
    "        }\n",
    "    ],\n",
    "    temperature=0.5,\n",
    "    top_p=1,\n",
    "    max_tokens=1024,\n",
    "    stream=True,\n",
    ")\n",
    "\n",
    "\n",
    "for chunk in completion:\n",
    "    if chunk.choices[0].delta.content is not None:\n",
    "        print(chunk.choices[0].delta.content, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "_2lWdM_fFSO7",
   "metadata": {
    "id": "_2lWdM_fFSO7"
   },
   "source": [
    "\n",
    "In this way, we execute a call to the `google/codegemma-7b` LLM model within the framework of NVIDIA's cloud through the available microservice of NVIDIA NIM. It's that simple.\n",
    "This is a simple example, just using the NVIDIA API, but in this blog, we will show how we can interact with Elasticsearch in a RAG application.\n",
    "\n",
    "## Integrating NVIDIA NIM with Elastic using LangChain: A Example with RAG\n",
    "\n",
    "What if we could combine the power of Elasticsearch's vector search to bring relevant information and send the context to an LLM call (running within NVIDIA's cloud), avoiding model hallucinations and using updated data from Elasticsearch? Then, let's set up our [RAG](https://www.elastic.co/what-is/retrieval-augmented-generation)?\n",
    "\n",
    "\n",
    "## Creating a Meetup Calculator\n",
    "\n",
    "Organizing a successful technical meetup requires meticulous planning and attention to detail. One of the biggest challenges is ensuring that the event meets the needs and expectations of all participants, providing a valuable and memorable experience. With this example, we will show how the integration of Elasticsearch, NVIDIA NIM, and generative AI can help in organizing a flawless technical meetup, from gathering information to generating customized reports.\n",
    "\n",
    "We will use Python in a Jupyter Notebook to demonstrate how to process and store registrations using Elasticsearch and NVIDIA NIM.\n",
    "\n",
    "### Initial Setup\n",
    "\n",
    "First, we set up our environment and install the necessary dependencies:\n",
    "\n",
    "You will need:\n",
    "- A Jupyter Notebook editor with Python, such as Google Colab.\n",
    "- Access to Elasticsearch. This example uses Elasticsearch version 8.13; if you are new, check out our [Quick Start on Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html).\n",
    "- Create an ELASTIC_API_KEY within Kibana. [Creating an API Key](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key)\n",
    "- Create a Developer account on the NVIDIA website.\n",
    "- Obtain the NVIDIA_API_KEY within the NVIDIA Catalog explorer.\n",
    "\n",
    "Once this is done, create a new file in Jupyter Notebook and type the commands to install the necessary packages:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "mPgaRsjKFVZJ",
   "metadata": {
    "id": "mPgaRsjKFVZJ"
   },
   "outputs": [],
   "source": [
    "!pip install langchain_nvidia_ai_endpoints\n",
    "!pip install langchain-community langchain-text-splitters langchain\n",
    "!pip install -q -U langchain-elasticsearch python-dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "t3FbKckQFgqP",
   "metadata": {
    "id": "t3FbKckQFgqP"
   },
   "source": [
    "NVIDIA provides an integration with LangChain through the `langchain-nvidia-ai-endpoints package`, which connects users to optimized artificial intelligence models, such as Mixtral 8x7B and Llama 2, available in the NGC catalog. With this, the models can be quickly customized and implemented with enterprise support through NVIDIA AI Enterprise. LangChain provides direct support for these models, enabling easy access and use in various applications, after the initial setup of an account in NGC and the installation of the package via pip.\n",
    "\n",
    "Import the libraries.\n",
    "  > **Note:** Here, we can see that we will make extensive use of LangChain.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "TKTm4iGFFkyj",
   "metadata": {
    "id": "TKTm4iGFFkyj"
   },
   "outputs": [],
   "source": [
    "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA\n",
    "from langchain_elasticsearch import ElasticsearchStore\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "from urllib.request import urlopen\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "IaAGbR3FFrAs",
   "metadata": {
    "id": "IaAGbR3FFrAs"
   },
   "source": [
    "### Environment Variables\n",
    "\n",
    "We need to create some environment variables to store values of API keys and connections that we do not want to expose in the code.\n",
    "\n",
    "Feel free to program in your own way. I used a file called `env_` and placed it in my Google Drive for convenience. This way, I access the file `(env_path = '/content/drive/MyDrive/@Blogs/05-NVIDIA-NIM/env_')` which will set the values in the command `load_dotenv(env_path)`.\n",
    "\n",
    "Here is an example of the format of my `env_` file:\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ZrIGBxzuFwzA",
   "metadata": {
    "id": "ZrIGBxzuFwzA"
   },
   "outputs": [],
   "source": [
    "CLOUD_PASS = \"your-elastic-password\"\n",
    "CLOUD_ID = \"your-elastic-cloud-id\"\n",
    "CLOUD_USER = \"elastic\"\n",
    "ELASTIC_API_KEY = \"your-key-value\"\n",
    "\n",
    "NVIDIA_API_KEY = \"your-key-value\"\n",
    "OPENAI_API_KEY = \"your-key-value\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "w9lz4QbhF1fO",
   "metadata": {
    "id": "w9lz4QbhF1fO"
   },
   "source": [
    "Following the code, we need to grant access to Google Drive to access the file that sets up the environment variables. For this, you must be logged into a Google account.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "TNyO73XkF6yk",
   "metadata": {
    "id": "TNyO73XkF6yk"
   },
   "outputs": [],
   "source": [
    "# Mount /content/drive\n",
    "from google.colab import drive\n",
    "\n",
    "drive.mount(\"/content/drive\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "l5sdDHxnGI2h",
   "metadata": {
    "id": "l5sdDHxnGI2h"
   },
   "source": [
    "Now we will access the file and configure the values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "Agrdqt1JGOq6",
   "metadata": {
    "id": "Agrdqt1JGOq6"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# Replace 'path/to/your/.env' with the correct path to your .env file on Google Drive\n",
    "env_path = \"/content/drive/MyDrive/@Blogs/05-NVIDIA-NIM/env_\"\n",
    "load_dotenv(env_path)\n",
    "\n",
    "# Elastic cloud credentials\n",
    "es_cloud_id = os.getenv(\"CLOUD_ID\")\n",
    "es_user = os.getenv(\"CLOUD_USER\")\n",
    "es_pass = os.getenv(\"CLOUD_PASS\")\n",
    "\n",
    "ELASTIC_API_KEY = os.getenv(\"ELASTIC_API_KEY\")"
   ]
  },
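  {
   "cell_type": "markdown",
   "id": "nimEnvCheck01",
   "metadata": {
    "id": "nimEnvCheck01"
   },
   "source": [
    "If a variable is missing from the `env_` file, `os.getenv` returns `None` and the failure only shows up later, when a client tries to connect. As a minimal sketch (the helper name `missing_env_vars` and the `REQUIRED_VARS` list are illustrative assumptions, not part of the original notebook), you could check the required names up front:\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "# Names used by this notebook; adjust to match your own env_ file.\n",
    "REQUIRED_VARS = ['CLOUD_ID', 'ELASTIC_API_KEY', 'NVIDIA_API_KEY']\n",
    "\n",
    "\n",
    "def missing_env_vars(names, env=None):\n",
    "    # Return the names that are unset or empty in the given mapping.\n",
    "    env = os.environ if env is None else env\n",
    "    return [name for name in names if not env.get(name)]\n",
    "\n",
    "\n",
    "# Demo with a plain dict so the check can be tried without real credentials.\n",
    "sample_env = {'CLOUD_ID': 'my-deployment:abc', 'ELASTIC_API_KEY': ''}\n",
    "print(missing_env_vars(REQUIRED_VARS, sample_env))\n",
    "```\n",
    "\n",
    "In the notebook, you could call `missing_env_vars(REQUIRED_VARS)` right after `load_dotenv(env_path)` and stop if the returned list is not empty."
   ]
  },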
  {
   "cell_type": "markdown",
   "id": "BRLkJswdGU5j",
   "metadata": {
    "id": "BRLkJswdGU5j"
   },
   "source": [
    "Let's also set the name of the index that will be created in Elasticsearch. In our case, it will be a meetup in the city of Rio de Janeiro.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "Q7ghqX5CGb8g",
   "metadata": {
    "id": "Q7ghqX5CGb8g"
   },
   "outputs": [],
   "source": [
    "elastic_index_name = \"01-rio-meetup\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3MxNJy4aGf7j",
   "metadata": {
    "id": "3MxNJy4aGf7j"
   },
   "source": [
    "As mentioned earlier, to connect to NVIDIA NIM, you need the API key provided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8NXTDPc0GkP3",
   "metadata": {
    "id": "8NXTDPc0GkP3"
   },
   "outputs": [],
   "source": [
    "os.environ[\"NVIDIA_API_KEY\"] = os.getenv(\"NVIDIA_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eYaIcZolGpbA",
   "metadata": {
    "id": "eYaIcZolGpbA"
   },
   "source": [
    "There you go, all environment variables and keys are ready to be used.\n",
    "\n",
    "### Meetup Scenario:\n",
    "\n",
    "Imagine a technical meetup with a maximum of 20 participants, held in the evening at a coworking space. After two talks, there will be a coffee break for networking. To meet everyone's needs, participants will be asked about dietary restrictions, mobility/accessibility difficulties, and beverage preferences (juice, water, beer, soda) during registration.\n",
    "\n",
    "We could store participant information in Elasticsearch as registrations are made in a registration system, but in our example, we will get this data from a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "wae9X1iVGrhg",
   "metadata": {
    "id": "wae9X1iVGrhg"
   },
   "outputs": [],
   "source": [
    "# Download the dataset\n",
    "\n",
    "url = \"https://raw.githubusercontent.com/salgado/public-dataset/main/meetup-participants.json\"\n",
    "\n",
    "response = urlopen(url)\n",
    "workplace_docs = json.loads(response.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06bIf9DeG1om",
   "metadata": {
    "id": "06bIf9DeG1om"
   },
   "source": [
    "Split Documents into Passages (Chunking)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "OJpxTtjmG0SV",
   "metadata": {
    "id": "OJpxTtjmG0SV"
   },
   "outputs": [],
   "source": [
    "metadata = []\n",
    "content = []\n",
    "\n",
    "for doc in workplace_docs:\n",
    "    content.append(doc[\"observation\"])\n",
    "    metadata.append({\"name\": doc[\"name\"], \"observation\": doc[\"observation\"]})\n",
    "\n",
    "text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)\n",
    "docs = text_splitter.create_documents(content, metadatas=metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vhgvm_iYG_SD",
   "metadata": {
    "id": "vhgvm_iYG_SD"
   },
   "source": [
    "Set embeddings of the chunks using Langchain predefine NVIDIA functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8IbisnloG_fh",
   "metadata": {
    "id": "8IbisnloG_fh"
   },
   "outputs": [],
   "source": [
    "query_embedding = NVIDIAEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "lsHgpYeuHJ82",
   "metadata": {
    "id": "lsHgpYeuHJ82"
   },
   "source": [
    "The `query_embedding` object created by `NVIDIAEmbeddings()` is used to convert text into numerical vectors (generation of embeddings). These vectors are then used for operations such as semantic searches.\n",
    "\n",
    "### Store in Elasticsearch vector database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "OrWZeQjQHKZL",
   "metadata": {
    "id": "OrWZeQjQHKZL"
   },
   "outputs": [],
   "source": [
    "es = ElasticsearchStore.from_documents(\n",
    "    docs,\n",
    "    es_cloud_id=es_cloud_id,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    "    index_name=elastic_index_name,\n",
    "    embedding=query_embedding,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "Z_ljxvXzHKpC",
   "metadata": {
    "id": "Z_ljxvXzHKpC"
   },
   "source": [
    "Let's create a retriever to collect data from event participants.\n",
    "> * Note: Remember that, to simplify, this index stores data for only 1 event."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "vCoMPM4LHK4M",
   "metadata": {
    "id": "vCoMPM4LHK4M"
   },
   "outputs": [],
   "source": [
    "retriever = es.as_retriever(search_kwargs={\"k\": 20})"
   ]
  },
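  {
   "cell_type": "markdown",
   "id": "nimTopKSketch",
   "metadata": {
    "id": "nimTopKSketch"
   },
   "source": [
    "Conceptually, the retriever embeds the question and returns the `k` stored chunks whose vectors are closest to it. The toy sketch below illustrates that idea in plain Python with made-up 3-dimensional vectors and cosine similarity. The vectors, the sample texts, and the `top_k` helper are illustrative assumptions; Elasticsearch's actual kNN search relies on approximate indexing and a configurable similarity rather than this brute-force loop.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def cosine_similarity(a, b):\n",
    "    # Cosine of the angle between two equal-length vectors.\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
    "    return dot / (norm_a * norm_b)\n",
    "\n",
    "\n",
    "def top_k(query_vec, indexed, k):\n",
    "    # Return the k (text, vector) pairs most similar to query_vec.\n",
    "    ranked = sorted(\n",
    "        indexed,\n",
    "        key=lambda item: cosine_similarity(query_vec, item[1]),\n",
    "        reverse=True,\n",
    "    )\n",
    "    return ranked[:k]\n",
    "\n",
    "\n",
    "# Made-up embeddings: real ones have hundreds or thousands of dimensions.\n",
    "indexed = [\n",
    "    ('vegan, no dairy', [0.9, 0.1, 0.0]),\n",
    "    ('uses a wheelchair', [0.0, 0.2, 0.9]),\n",
    "    ('gluten intolerant', [0.8, 0.3, 0.1]),\n",
    "]\n",
    "query = [1.0, 0.2, 0.0]  # pretend embedding of 'dietary restrictions'\n",
    "\n",
    "for text, _ in top_k(query, indexed, k=2):\n",
    "    print(text)\n",
    "```\n",
    "\n",
    "With `search_kwargs={\"k\": 20}` and an event capped at 20 participants, the retriever can return every registration (assuming each observation fits in a single chunk), which is what aggregate questions about the whole group need."
   ]
  },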
  {
   "cell_type": "markdown",
   "id": "fZPy_r18HLJL",
   "metadata": {
    "id": "fZPy_r18HLJL"
   },
   "source": [
    "### Using a model from NVIDIA NIM microservices\n",
    "\n",
    "One of the great advantages for developers when using NVIDIA NIM microservices is the ease of selecting from various available and scalable models, all in a single line of code.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "CpjApKa6HLW3",
   "metadata": {
    "id": "CpjApKa6HLW3"
   },
   "outputs": [],
   "source": [
    "model = ChatNVIDIA(model=\"mistral_7b\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7kDuQOtt5Uu",
   "metadata": {
    "id": "a7kDuQOtt5Uu"
   },
   "source": [
    "In the example we presented, we used the LLM model `mistral_7b`, developed by Mistral AI. This model is part of a set of over a dozen popular AI models supported by NVIDIA NIM microservices. The list includes models like `Mixtral 8x7B`, `Llama 70B`, `Stable Video Diffusion`, `Code Llama 70B`, `Kosmos-2`, among others. The NIM platform is designed to simplify the deployment of NVIDIA AI Foundation models and custom models.\n",
    "\n",
    "### Creating a RAG with Elastic using LangChain\n",
    "\n",
    "To answer questions about participants, we employed the RAG architecture, with relevant information stored in the Elasticsearch index and dense content in the vector field extracted from participant observation data.\n",
    "\n",
    "#### Response Chain Execution\n",
    "\n",
    "Now we are ready to ask the model to help us with organizing the meetup.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1msYKJISt5zF",
   "metadata": {
    "id": "1msYKJISt5zF"
   },
   "outputs": [],
   "source": [
    "template = \"\"\"Answer the question based only on the following context:\\n\n",
    "\n",
    "{context}\n",
    "\n",
    "Question: {question}\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "WUgfIU3jt6Db",
   "metadata": {
    "id": "WUgfIU3jt6Db"
   },
   "source": [
    "The code snippet presented defines a template to generate chat prompts, using placeholders to insert specific contexts and questions. The `template` variable stores a string that guides the response of a natural language model based solely on the provided context and question. This structure is converted into a `ChatPromptTemplate` object through the `from_template` function, allowing for dynamic creation of formatted prompts that guide the model's responses in a relevant and focused manner, facilitating precise interactions in applications using natural language processing."
   ]
  },
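  {
   "cell_type": "markdown",
   "id": "nimTemplateFill",
   "metadata": {
    "id": "nimTemplateFill"
   },
   "source": [
    "To see what the model actually receives, you can fill the same template with plain `str.format`. This is only an illustration with a made-up participant note and question; in the notebook, `ChatPromptTemplate` performs the substitution and wraps the result in a chat message.\n",
    "\n",
    "```python\n",
    "template = '''Answer the question based only on the following context:\n",
    "\n",
    "{context}\n",
    "\n",
    "Question: {question}\n",
    "'''\n",
    "\n",
    "# Hypothetical values standing in for retrieved documents and a user question.\n",
    "context = 'Ana: vegetarian, prefers juice'\n",
    "question = 'Who prefers juice?'\n",
    "\n",
    "filled = template.format(context=context, question=question)\n",
    "print(filled)\n",
    "```"
   ]
  },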
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "oHmAPpvft6YT",
   "metadata": {
    "id": "oHmAPpvft6YT"
   },
   "outputs": [],
   "source": [
    "chain = (\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "NXC2l9CFt6oM",
   "metadata": {
    "id": "NXC2l9CFt6oM"
   },
   "source": [
    "Here, we set up a pipeline or transformation chain that starts with the retrieval prompt, contextualizing with data from Elasticsearch through the Reporting Observer object, passes through the LLM model, and ends with an output parser that converts the model's output into plain text. This chain is crucial for processing the question and generating the response in a readable format.\n",
    "This code snippet sets up a data processing pipeline to handle chat interactions, particularly in the context of integration with NVIDIA. Let's break down each part:\n",
    "* `{\"context\": retriever , \"question\": RunnablePassthrough()}`: Here, we define the initial structure of the pipeline. `retriever` is a variable representing the context that will be provided to the language model. `RunnablePassthrough()` is an object representing the question that will be asked to the model. In other words, this initial part of the pipeline prepares the context and question to be sent to the language model.\n",
    "* `| prompt`: This part of the code applies the `prompt` object, created from a template, to generate a formatted chat prompt based on the provided context and question. The generated prompt guides the language model to produce a relevant response.\n",
    "* `| model`: Here, the pipeline sends the formatted prompt to the language model, which processes the input and generates a response based on the provided information.\n",
    "* `| StrOutputParser()`: Finally, `StrOutputParser()` is used to process the output from the language model and format it as needed before presenting it as the final result. This may include text formatting, filtering out irrelevant information, among other adjustments.\n"
   ]
  },
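  {
   "cell_type": "markdown",
   "id": "nimChainSketch",
   "metadata": {
    "id": "nimChainSketch"
   },
   "source": [
    "The data flow through the chain can be mimicked with ordinary functions. In the sketch below, `fake_retriever` and `fake_model` are hypothetical stand-ins (no Elasticsearch or NVIDIA call is made). The point is only to show that the question is sent to the retriever and also passed through unchanged, and that each later step consumes the previous step's output.\n",
    "\n",
    "```python\n",
    "def fake_retriever(question):\n",
    "    # Stand-in for the Elasticsearch retriever: returns fixed 'documents'.\n",
    "    return ['Ana: vegetarian', 'Bruno: no restrictions']\n",
    "\n",
    "\n",
    "def passthrough(value):\n",
    "    # Mirrors RunnablePassthrough: the input is returned unchanged.\n",
    "    return value\n",
    "\n",
    "\n",
    "def build_prompt(inputs):\n",
    "    # Mirrors the prompt step: fill a template with context and question.\n",
    "    context = '\\n'.join(inputs['context'])\n",
    "    return 'Context:\\n' + context + '\\n\\nQuestion: ' + inputs['question']\n",
    "\n",
    "\n",
    "def fake_model(prompt_text):\n",
    "    # Stand-in for the LLM: reports how many context lines it received.\n",
    "    context_part = prompt_text.split('\\n\\nQuestion:')[0]\n",
    "    notes = context_part.splitlines()[1:]  # drop the 'Context:' header\n",
    "    return {'content': f'I saw {len(notes)} participant notes.'}\n",
    "\n",
    "\n",
    "def parse_output(message):\n",
    "    # Mirrors StrOutputParser: keep only the text of the model message.\n",
    "    return message['content']\n",
    "\n",
    "\n",
    "def run_chain(question):\n",
    "    inputs = {\n",
    "        'context': fake_retriever(question),\n",
    "        'question': passthrough(question),\n",
    "    }\n",
    "    return parse_output(fake_model(build_prompt(inputs)))\n",
    "\n",
    "\n",
    "print(run_chain('Who is vegetarian?'))\n",
    "```"
   ]
  },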
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "jiQf895Vt61b",
   "metadata": {
    "id": "jiQf895Vt61b"
   },
   "outputs": [],
   "source": [
    "prompt = \"\"\"Generate a menu of food and drink to be served, respecting dietary restrictions.\\n\n",
    "\n",
    "Generate a single order that can be placed on a food and drink delivery website.\n",
    "We will buy spicy sausage pizza, vegan pizza, and gluten-free Margherita pizza.\n",
    "For drinks, we will have Coca-Cola, water, beer, and grape juice.\n",
    "Calculate the quantity of the order so that each guest can eat 3 slices of pizza and drink 500ml of beverage.\n",
    "1 pizza contains 8 slices.\n",
    "\n",
    "Finally, Estimate the cost of this purchase based on average prices in the city of Rio de Janeiro.\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "for s in chain.stream(prompt):\n",
    "    print(s, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "O7DoLkp2ucYO",
   "metadata": {
    "id": "O7DoLkp2ucYO"
   },
   "source": [
    "Finally, we define a specific question related to event planning and invoke the `chain` with this question. The system uses the RAG model to retrieve relevant information and generate a response, which is then printed.\n",
    "\n",
    "We could then ask other questions using the same chain, for example:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "kgeK82nPucrN",
   "metadata": {
    "id": "kgeK82nPucrN"
   },
   "outputs": [],
   "source": [
    "prompt = \"\"\"\n",
    "How many guests have dietary restrictions and what are they?\n",
    "\"\"\"\n",
    "\n",
    "for s in chain.stream(prompt):\n",
    "    print(s, end=\"\")"
   ]
  },
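  {
   "cell_type": "markdown",
   "id": "nimOrderCheck",
   "metadata": {
    "id": "nimOrderCheck"
   },
   "source": [
    "LLMs can slip on arithmetic, so it is worth checking the quantities in the generated order against the rules stated in the menu question (3 slices and 500 ml per guest, 8 slices per pizza). A minimal sketch, assuming the full 20 guests from the scenario attend; the function name `order_totals` is hypothetical:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def order_totals(guests, slices_per_guest=3, slices_per_pizza=8, ml_per_guest=500):\n",
    "    # Return (pizzas to order, litres of drinks) for the given headcount.\n",
    "    pizzas = math.ceil(guests * slices_per_guest / slices_per_pizza)\n",
    "    litres = guests * ml_per_guest / 1000\n",
    "    return pizzas, litres\n",
    "\n",
    "\n",
    "pizzas, litres = order_totals(20)\n",
    "print(f'{pizzas} pizzas, {litres} litres of drinks')  # 8 pizzas, 10.0 litres of drinks\n",
    "```\n",
    "\n",
    "The split between pizza types and beverages still depends on the participants' restrictions and preferences, which is the part the RAG chain retrieves from Elasticsearch."
   ]
  },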
  {
   "cell_type": "markdown",
   "id": "-F-KlCFguc3J",
   "metadata": {
    "id": "-F-KlCFguc3J"
   },
   "source": [
    "  > **Note 1:** It is recommended to explore the use of other models and consider fine-tuning and other advanced techniques that go beyond the scope of this blog.\n",
    "\n",
 **Note 2:**">
    "  > **Note 2:** Governing terms: your use of the NVIDIA NIM API is governed by the NVIDIA API Trial Service Terms of Use, and the use of this model is governed by the NVIDIA AI Foundation Models Community License and the Mistral AI Terms of Use.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e38294ac",
   "metadata": {
    "id": "e38294ac"
   },
   "source": [
    "## Conclusion\n",
    "\n",
    "The integration of NVIDIA NIM with Elasticsearch allows users to leverage real-time analytics and complex search capabilities efficiently. By combining NVIDIA's LLM architectures with Elasticsearch's flexible and scalable search engine, organizations can achieve more responsive insights and search experiences. NVIDIA NIM's microservices architecture effortlessly scales to meet diverse demands and supports a wide range of applications, from enhancing chatbots with nuanced language understanding to providing real-time sentiment analysis and accurate language translation. To access these functionalities, we used the LangChain framework that integrates with NVIDIA NIM's microservices and Elasticsearch.\n",
    "\n",
    "What would you change to meet your needs??\n",
    "\n",
    "\n",
    "## References\n",
    "\n",
    "* [NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/)\n",
    "* [NVIDIA AI Enterprise Documentation](https://docs.nvidia.com/ai-enterprise/)\n",
    "* [NVIDIA AI](https://www.nvidia.com/en-us/ai/)\n",
    "* [Quick Start on Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html)\n",
    "* [Creating an API Key](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key)\n",
    "* [What is retrieval augmented generation (RAG)?](https://www.elastic.co/what-is/retrieval-augmented-generation)\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
